571 research outputs found
Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data
Determining the functional structure of biological networks is a central goal
of systems biology. One approach is to analyze gene expression data to infer a
network of gene interactions on the basis of their correlated responses to
environmental and genetic perturbations. The inferred network can then be
analyzed to identify functional communities. However, commonly used algorithms
can yield unreliable results due to experimental noise, algorithmic
stochasticity, and the influence of arbitrarily chosen parameter values.
Furthermore, the results obtained typically provide only a simplistic view of
the network partitioned into disjoint communities and provide no information of
the relationship between communities. Here, we present methods to robustly
detect coregulated and functionally enriched gene communities and demonstrate
their application and validity for Escherichia coli gene expression data.
Applying a recently developed community detection algorithm to the network of
interactions identified with the context likelihood of relatedness (CLR)
method, we show that a hierarchy of network communities can be identified.
These communities significantly enrich for gene ontology (GO) terms, consistent
with them representing biologically meaningful groups. Further, analysis of the
most significantly enriched communities identified several candidate new
regulatory interactions. The robustness of our methods is demonstrated by
showing that a core set of functional communities is reliably found when
artificial noise, modeling experimental noise, is added to the data. We find
that noise mainly acts conservatively, increasing the relatedness required for
a network link to be reliably assigned and decreasing the size of the core
communities, rather than causing association of genes into new communities.Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1
was not uploaded but is available by contacting the author. 27 pages, 5
figures, 15 supplementary file
Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli
The set of regulatory interactions between genes, mediated by transcription
factors, forms a species' transcriptional regulatory network (TRN). By
comparing this network with measured gene expression data one can identify
functional properties of the TRN and gain general insight into transcriptional
control. We define the subnet of a node as the subgraph consisting of all nodes
topologically downstream of the node, including itself. Using a large set of
microarray expression data of the bacterium Escherichia coli, we find that the
gene expression in different subnets exhibits a structured pattern in response
to environmental changes and genotypic mutation. Subnets with less changes in
their expression pattern have a higher fraction of feed-forward loop motifs and
a lower fraction of small RNA targets within them. Our study implies that the
TRN consists of several scales of regulatory organization: 1) subnets with more
varying gene expression controlled by both transcription factors and
post-transcriptional RNA regulation, and 2) subnets with less varying gene
expression having more feed-forward loops and less post-transcriptional RNA
regulation.Comment: 14 pages, 8 figures, to be published in PLoS Computational Biolog
Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution
The standard approach to analyzing 16S tag sequence data, which relies on
clustering reads by sequence similarity into Operational Taxonomic Units
(OTUs), underexploits the accuracy of modern sequencing technology. We present
a clustering-free approach to multi-sample Illumina datasets that can identify
independent bacterial subpopulations regardless of the similarity of their 16S
tag sequences. Using published data from a longitudinal time-series study of
human tongue microbiota, we are able to resolve within standard 97% similarity
OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S
tags differing by as little as 1 nucleotide (99.2% similarity). A comparative
analysis of oral communities of two cohabiting individuals reveals that most
such subpopulations are shared between the two communities at 100% sequence
identity, and that dynamical similarity between subpopulations in one host is
strongly predictive of dynamical similarity between the same subpopulations in
the other host. Our method can also be applied to samples collected in
cross-sectional studies and can be used with the 454 sequencing platform. We
discuss how the sub-OTU resolution of our approach can provide new insight into
factors shaping community assembly.Comment: Updated to match the published version. 12 pages, 5 figures +
supplement. Significantly revised for clarity, references added, results not
change
Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection
In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene such as a key regulatory protein. Multiple studies have been conducted using various correlation measures to identify co-expressed genes. While working well for small datasets, the heterogeneity introduced from increased sample size inevitably reduces the sensitivity and specificity of these approaches. This is because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large scale gene expression datasets. Our simulation studies suggest that this method outperforms existing correlation coefficients or mutual information-based query tools. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors
A Relative Variation-Based Method to Unraveling Gene Regulatory Networks
Gene regulatory network (GRN) reconstruction is essential in understanding the functioning and pathology of a biological system. Extensive models and algorithms have been developed to unravel a GRN. The DREAM project aims to clarify both advantages and disadvantages of these methods from an application viewpoint. An interesting yet surprising observation is that compared with complicated methods like those based on nonlinear differential equations, etc., methods based on a simple statistics, such as the so-called -score, usually perform better. A fundamental problem with the -score, however, is that direct and indirect regulations can not be easily distinguished. To overcome this drawback, a relative expression level variation (RELV) based GRN inference algorithm is suggested in this paper, which consists of three major steps. Firstly, on the basis of wild type and single gene knockout/knockdown experimental data, the magnitude of RELV of a gene is estimated. Secondly, probability for the existence of a direct regulation from a perturbed gene to a measured gene is estimated, which is further utilized to estimate whether a gene can be regulated by other genes. Finally, the normalized RELVs are modified to make genes with an estimated zero in-degree have smaller RELVs in magnitude than the other genes, which is used afterwards in queuing possibilities of the existence of direct regulations among genes and therefore leads to an estimate on the GRN topology. This method can in principle avoid the so-called cascade errors under certain situations. Computational results with the Size 100 sub-challenges of DREAM3 and DREAM4 show that, compared with the -score based method, prediction performances can be substantially improved, especially the AUPR specification. Moreover, it can even outperform the best team of both DREAM3 and DREAM4. Furthermore, the high precision of the obtained most reliable predictions shows that the suggested algorithm may be very helpful in guiding biological experiment designs
Analysis of among-site variation in substitution patterns
Substitution patterns among nucleotides are often assumed to be constant in phylogenetic analyses. Although variation in the average rate of substitution among sites is commonly accounted for, variation in the relative rates of specific types of substitution is not. Here, we review details of methodologies used for detecting and analyzing differences in substitution processes among predefined groups of sites. We describe how such analyses can be performed using existing phylogenetic tools, and discuss how new phylogenetic analysis tools we have recently developed can be used to provide more detailed and sensitive analyses, including study of the evolution of mutation and substitution processes. As an example we consider the mitochondrial genome, for which two types of transition deaminations (C⇒T and A⇒G) are strongly affected by single-strandedness during replication, resulting in a strand asymmetric mutation process. Since time spent single-stranded varies along the mitochondrial genome, their differential mutational response results in very different substitution patterns in different regions of the genome
The Escherichia coli transcriptome mostly consists of independently regulated modules
Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome
Origin of Co-Expression Patterns in E.coli and S.cerevisiae Emerging from Reverse Engineering Algorithms
BACKGROUND: The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from compendia of high throughput microarray data has been extensively used in the last few years to deduce/integrate/validate various types of "physical" networks of interactions among genes or gene products. RESULTS: This paper gives a comprehensive overview of which of these networks emerge significantly when reverse engineering large collections of gene expression data for two model organisms, E. coli and S. cerevisiae, without any prior information. For the first organism the pattern of co-expression is shown to reflect in fine detail both the operonal structure of the DNA and the regulatory effects exerted by the gene products when co-participating in a protein complex. For the second organism we find that direct transcriptional control (e.g., transcription factor-binding site interactions) has little statistical significance in comparison to the other regulatory mechanisms (such as co-sharing a protein complex, co-localization on a metabolic pathway or compartment), which are however resolved at a lower level of detail than in E. coli. CONCLUSION: The gene co-expression patterns deduced from compendia of profiling experiments tend to unveil functional categories that are mainly associated to stable bindings rather than transient interactions. The inference power of this systematic analysis is substantially reduced when passing from E. coli to S. cerevisiae. This extensive analysis provides a way to describe the different complexity between the two organisms and discusses the critical limitations affecting this type of methodologies
- …